Microsoft releases a tutorial for running its Fara browser-use agent in Google Colab using a mock OpenAI-compatible endpoint, enabling developers to experiment with autonomous web browsing.
Google's new Gemini Spark AI agent demonstrates impressive capabilities but raises concerns about privacy risks and financial costs. The technology's 24/7 operation and data access requirements create significant tradeoffs for users.
Oppo open-sources X-OmniClaw, an Android AI agent that uses camera, screen, and voice locally to automate tasks without leaving the phone.
Google is testing Remy, a new AI personal agent for Gemini, designed to carry out tasks autonomously for users. The tool is currently limited to internal staff but signals a shift toward more proactive AI assistance.
Microsoft launches a specialized AI agent in Word designed for legal teams, focusing on contract review, document editing, and negotiation tracking. The tool aims to streamline legal work through structured workflows derived from actual legal practice.
Learn how ml-intern, an open-source AI agent from Hugging Face, automates the complex post-training workflow for large language models, making AI research faster and more accessible.
This article explains the OpenClaw AI architecture that powers always-on smart glasses, detailing how it enables continuous perception and adaptive task execution in real-world environments.
Emergent has launched Wingman, an autonomous AI agent designed to empower non-technical users to manage daily applications through natural language commands.
Microsoft is developing a new AI agent similar to OpenClaw but with enhanced enterprise security features. The move positions Microsoft to address the risks associated with open-source AI agents while meeting enterprise demand for secure AI solutions.
This explainer explores Zhipu AI's GLM-5V-Turbo, a multimodal AI model that translates design mockups into executable code, showcasing advancements in computer vision, natural language processing, and automated code synthesis.
AI2's new open-source web agent MolmoWeb navigates the web using only screenshots, outperforming larger proprietary systems despite its small size.
Researchers are exploring how to build vision-guided web AI agents using the MolmoWeb-4B model, which interprets screenshots to navigate and interact with websites without relying on HTML parsing.